class: center, middle, inverse, title-slide .title[ # Multiple linear regression: Interactions ] .subtitle[ ## Lecture 4 ] .author[ ### Manuel Villarreal ] .date[ ### 08/28/24 ] --- ### Recap - In the last class we ran our first multiple linear regression using age and sex as predictors of blood pressure. -- - Adding the sex variable to our model made the residuals look better when plotted as a function of the observation order (id) which was not the case for the simple linear model. -- - However, we still need to see that the residuals behave as expected (mean 0 and constant variance) as a function of other variables. --- ### Residuals as a function of age .pull-left[ ``` r # Plot residual against # its corresponding id (order) plot(x = blood$age, y = mbp$residuals, ann = FALSE, axes = FALSE, cex = 2, pch = 21, col = "white", bg = "#f8971f") box(bty = "l") axis(side = 1, cex.axis = 1.5) axis(side = 2, cex.axis = 1.5, las = 2) mtext(text = "Residuals", side = 2, line = 2.5, cex = 2) mtext(text = "age", side = 1, line = 3, cex = 2) # what should we expect? abline(h = 0, col = "#005f86", lty = 4, lwd = 3.5) ``` ] .pull-right[ <img src="data:image/png;base64,#lecture-04_files/figure-html/residual-age-out-1.png" width="504" style="display: block; margin: auto;" /> ] --- ### Residuals of a function of age .panelset[ .panel[.panel-name[Residuals] - When plotting the residuals as a function of age we can't see anything wrong immediately. - However, we can still stratify our residuals by sex. ] .panel[.panel-name[Females] .pull-left[ ``` r par(mar = c(4.2,4.2,1,1)) plot(x = blood$age[which(blood$sex == "female")], y = mbp$residuals[which(blood$sex == "female")], xlim = c(37, 65), ylim = c(-10, 10), ann = FALSE, axes = FALSE, cex = 2, pch = 21, col = "white", bg = "#f8971f") box(bty = "l") axis(side = 1, cex.axis = 1.5) axis(side = 2, cex.axis = 1.5, las = 2) mtext(text = "Residuals", side = 2, line = 2.5, cex = 2) mtext(text = "Age", side = 1, line = 3, cex = 2) # what should we expect? abline(h = 0, col = "#005f86", lty = 4, lwd = 3.5) ``` ] .pull-right[ <img src="data:image/png;base64,#lecture-04_files/figure-html/residual-f-age-out-1.png" width="504" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Males] .pull-left[ ``` r par(mar = c(4.2,4.2,1,1)) plot(x = blood$age[which(blood$sex == "male")], y = mbp$residuals[which(blood$sex == "male")], xlim = c(37, 65), ylim = c(-10, 10), ann = FALSE, axes = FALSE, cex = 2, pch = 21, col = "white", bg = "#003f5c") box(bty = "l") axis(side = 1, cex.axis = 1.5) axis(side = 2, cex.axis = 1.5, las = 2) mtext(text = "Residuals", side = 2, line = 2.5, cex = 2) mtext(text = "Age", side = 1, line = 3, cex = 2) # what should we expect? abline(h = 0, col = "#005f86", lty = 4, lwd = 3.5) ``` ] .pull-right[ <img src="data:image/png;base64,#lecture-04_files/figure-html/residual-m-age-out-1.png" width="504" style="display: block; margin: auto;" /> ] ] ] --- ### Stratified residuals - Once we look at the residuals of the model independently for each of our two groups we can see that there's still a small bias in our model. -- - For example, we noticed that on average our residuals for the female group tend to increase as a function of age. -- - While the residuals for the male group tend to decrease with age. -- - These type of results suggest to us that there might be an interaction between the two variables. --- ### Interactions - From the previous plots we can see that, once we stratify the residuals by sex, we can see a trend in the residuals: -- - Residuals for female participants increase as a function of age. -- - Residuals for male participants decrease as a function of age. -- - This suggests that there might be an interaction between sex and age that could be important for the model. --- ### Adding interactions to an MLR - Formally, we can express the interaction model as: `$$\text{blood pressure}_i = \beta_0 + \beta_1\text{age}_i + \beta_2\text{sex}_i + \beta_3\text{age}_i \text{sex}_i + \epsilon_i$$` -- - In R we can add interactions to a linear model in two different ways: ``` r mlr_int_1 <- lm(formula = blood_pressure ~ age + sex + age * sex, data = blood) mlr_int_2 <- lm(formula = blood_pressure ~ age:sex, data = blood) ``` -- - Just like before, the output of the function includes useful elements that we can use to evaluate the model. --- ### Adding interactions to an MLR - An important thing to note is that the `formula` notation that includes the colon: ``` r mlr_int_2 <- lm(formula = blood_pressure ~ age:sex, data = blood) ``` has a different interpretation of the estimated values in comparison to the notation that has the sums and products ``` r mlr_int_1 <- lm(formula = blood_pressure ~ age + sex + age * sex, data = blood) ``` -- - The reason for this difference is that the model produced by the **colon** notation model does not include a specific coefficient for age, and instead has one age coefficient (parameter) for each group. --- ### Residual plots .panelset[ .panel[.panel-name[Interactions] .pull-left[ ``` r par(mar = c(4.2,4.2,1,1)) curve(expr = mbp$coefficients[1] + mbp$coefficients[2] * x, lwd = 2, from = 36, to = 69, ylim = c(115, 145), col = "#f8971f66", lty = 2, las = 1, xlab = "Age", ylab = "Blood pressure", cex = 1.5) curve(expr = mlr_int_1$coefficients[1] + mlr_int_1$coefficients[2] * x, lwd = 3, from = 36, to = 69, add = TRUE, col = "#f8971f") curve(expr = (mbp$coefficients[1] + mbp$coefficients[3]) + mbp$coefficients[2] * x, lwd = 2, from = 36, to = 69, add = TRUE, col = "#003f5c66", lty = 2) curve(expr = (mlr_int_1$coefficients[1] + mlr_int_1$coefficients[3]) + (mlr_int_1$coefficients[2] + mlr_int_1$coefficients[4]) * x, lwd = 3, from = 36, to = 69, add = TRUE, col = "#003f5c") ``` ] .pull-right[ <img src="data:image/png;base64,#lecture-04_files/figure-html/line-plot-out-1.png" width="504" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Histogram] .pull-left[ ``` r par(mar = c(4.2,4.2,1,1)) # Then we can draw the histogram hist(x = mlr_int_2$residuals, breaks = 30, axes = FALSE, ann = FALSE, freq = FALSE, border = "white", col = "#f8971f") abline(v = mean(mlr_int_1$residuals), lty = 2, lwd = 3, col = "#005f86") box(bty = "l") axis(side = 1, cex.axis = 1.5) mtext(side = 1, cex = 1.6, line = 3, text = expression( x = paste("Residuals: ", hat(epsilon)[i]))) ``` ] .pull-right[ <img src="data:image/png;base64,#lecture-04_files/figure-html/err-hist-out-1.png" width="504" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Females] .pull-left[ ``` r par(mar = c(4.2,4.2,1,1)) plot(x = blood$age[which(blood$sex == "female")], y = mlr_int_1$residuals[which(blood$sex == "female")], xlim = c(37, 65), ylim = c(-10, 10), ann = FALSE, axes = FALSE, cex = 2, pch = 21, col = "white", bg = "#f8971f") box(bty = "l") axis(side = 1, cex.axis = 1.5) axis(side = 2, cex.axis = 1.5, las = 2) mtext(text = "Residuals", side = 2, line = 2.5, cex = 2) mtext(text = "Age", side = 1, line = 3, cex = 2) # what should we expect? abline(h = 0, col = "#005f86", lty = 4, lwd = 3.5) ``` ] .pull-right[ <img src="data:image/png;base64,#lecture-04_files/figure-html/residual-f-int-out-1.png" width="504" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Males] .pull-left[ ``` r par(mar = c(4.2,4.2,1,1)) plot(x = blood$age[which(blood$sex == "male")], y = mlr_int_1$residuals[which(blood$sex == "male")], xlim = c(37, 65), ylim = c(-10, 10), ann = FALSE, axes = FALSE, cex = 2, pch = 21, col = "white", bg = "#003f5c") box(bty = "l") axis(side = 1, cex.axis = 1.5) axis(side = 2, cex.axis = 1.5, las = 2) mtext(text = "Residuals", side = 2, line = 2.5, cex = 2) mtext(text = "Age", side = 1, line = 3, cex = 2) # what should we expect? abline(h = 0, col = "#005f86", lty = 4, lwd = 3.5) ``` ] .pull-right[ <img src="data:image/png;base64,#lecture-04_files/figure-html/residual-m-int-out-1.png" width="504" style="display: block; margin: auto;" /> ] ] .panel[.panel-name[Auto-correlation] .pull-left[ ``` r # We can use the function built in r acf(x = mlr_int_1$residuals, las = 1, lwd = 2, main = "Series: residuals", ylab = "", xlab = "Lag", cex.lab = 1.7, cex.main = 1.5, cex.axis = 1.3) mtext(text = "Auto-correlation", side = 2, line = 2.8, cex = 1.7) ``` ] .pull-right[ <img src="data:image/png;base64,#lecture-04_files/figure-html/auto-cor-int-out-1.png" width="504" style="display: block; margin: auto;" /> ] ] ] --- ### Conclusion - Now that we have added interactions to the blood pressure model our estimated residuals look good. -- - Their mean is close to 0, they have constant variance (rectangle when plotted against other variables), and they have a low auto-correlation which suggests independence. -- - This seems to be a good model for our data but how can we know if it is? -- - Furthermore, when do we decide to stop adding covariates (the `\(X\)`'s) to our model? -- - To answer these two questions we need to do a Null Hypotheses Test (NHT). -- - This process is known as inference and it's one of the most popular approaches in statistics.